
Checkpoints V2: add migration option #855

Merged: computermode merged 31 commits into main from add-migrate-v2-command on Apr 8, 2026

@computermode computermode commented Apr 4, 2026

Adds a hidden `migrate` CLI command that takes a `--checkpoints` parameter to migrate checkpoints from "v1" to "v2". I think this will be automated for end users once we are ready to green-light v2, but for now it's handy for testing.

Follows the migrate command validation for a test repo as written in https://github.com/entireio/cli/pull/839/changes#diff-f8101f182954049d140980fd56caf1e09e5c85a771f21c972d986ce2229d7e6eR439.

Validated that for checkpoints with a transcript.jsonl in v1, the full.jsonl + transcript.jsonl files were created in the proper places for v2.
Rerunning the command skips checkpoints that are already migrated + shows which transcript.jsonl files couldn't be created.

➜  test-repo git:(entire/checkpoints/v1) ✗ git show-ref -- refs/entire/checkpoints/v2/main
c1641139fb51a64faec6434e804368e8d8006e00 refs/entire/checkpoints/v2/main

➜  test-repo git:(entire/checkpoints/v1) ✗ git show-ref -- refs/entire/checkpoints/v2/full/current
faf645d2b1f21aa6e019f6252a7d5d521a7acb4e refs/entire/checkpoints/v2/full/current

Migrating v1 checkpoints to v2...
  [1/25] Migrating checkpoint 528c39637ed4... skipped (already in v2)
  [2/25] Migrating checkpoint e8e062073afb... skipped (already in v2)
  [3/25] Migrating checkpoint 3ced2b94d513... skipped (already in v2)
  [4/25] Migrating checkpoint 223e6f793eed... skipped (already in v2)
  [5/25] Migrating checkpoint 699504f045e2... in v2, but transcript.jsonl could not be generated: agent "Copilot CLI"
  [6/25] Migrating checkpoint a0f8612fcb64... added transcript.jsonl for 1 session(s)
  [7/25] Migrating checkpoint f55b1d3b434a... skipped (already in v2)
  [8/25] Migrating checkpoint 943126df1f2a... skipped (already in v2)
  [9/25] Migrating checkpoint e6b2f769a273... skipped (already in v2)
  [10/25] Migrating checkpoint 4be62319bf98... skipped (already in v2)
  [11/25] Migrating checkpoint b8b6fbd5f55f... skipped (already in v2)
  [12/25] Migrating checkpoint b1b2d74d8ef1... added transcript.jsonl for 1 session(s)
  [13/25] Migrating checkpoint 6215534dfb8e... skipped (already in v2)
  [14/25] Migrating checkpoint 5a31697679ee... skipped (already in v2)
  [15/25] Migrating checkpoint 581d7b2848de... skipped (already in v2)
  [16/25] Migrating checkpoint 2151d9f18a50... skipped (already in v2)
  [17/25] Migrating checkpoint ecf782729563... skipped (already in v2)
  [18/25] Migrating checkpoint c81467b72ca0... skipped (already in v2)
  [19/25] Migrating checkpoint c8eb421389c0... skipped (already in v2)
  [20/25] Migrating checkpoint 14d75ebe3a6d... skipped (already in v2)
  [21/25] Migrating checkpoint 729ac91fce19... skipped (already in v2)
  [22/25] Migrating checkpoint 79d52455f160... skipped (already in v2)
  [23/25] Migrating checkpoint edd2df3466e5... skipped (already in v2)
  [24/25] Migrating checkpoint a18a21695d2e... skipped (already in v2)
  [25/25] Migrating checkpoint 6284d02bfdae... skipped (already in v2)

Inspecting an example transcript.jsonl file:

git show refs/entire/checkpoints/v2/main:f5/5b1d3b434a/0/transcript.jsonl

{"v":1,"agent":"Cursor","cli_version":"dev","type":"user","content":[{"text":"create a markdown file stating this is for testing cursor"}]}
{"v":1,"agent":"Cursor","cli_version":"dev","type":"assistant","content":[{"text":"The user wants me to create a markdown file stating this is for testing cursor. Let me create it in the workspace directory.","type":"text"},{"input":{"path":"/Users/ninawork/entire/test-repos/test-repo/cursor-attach/testing-cursor.md","contents":"# Testing Cursor\n\nThis file is for testing Cursor.\n"},"name":"Write","type":"tool_use"},{"text":"Created `testing-cursor.md` in the `cursor-attach` directory.","type":"text"}]}
{"v":1,"agent":"Cursor","cli_version":"dev","type":"user","content":[{"text":"how do I quit this cli"}]}
{"v":1,"agent":"Cursor","cli_version":"dev","type":"assistant","content":[{"text":"You can quit by pressing **Ctrl+C** or typing **exit**.","type":"text"}]}
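For anyone poking at a migrated transcript.jsonl by hand, the sample lines above can be decoded with a small reader like the one below. This is only a sketch: the struct names (`transcriptLine`, `contentBlock`) and `parseTranscript` are hypothetical and model just the fields visible in the sample, not the CLI's actual types.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// contentBlock models one element of the "content" array.
// Hypothetical type: only fields seen in the sample output are included.
type contentBlock struct {
	Type  string          `json:"type,omitempty"`
	Text  string          `json:"text,omitempty"`
	Name  string          `json:"name,omitempty"`
	Input json.RawMessage `json:"input,omitempty"`
}

// transcriptLine models one JSON line of a compact transcript.
type transcriptLine struct {
	V          int            `json:"v"`
	Agent      string         `json:"agent"`
	CLIVersion string         `json:"cli_version"`
	Type       string         `json:"type"`
	Content    []contentBlock `json:"content"`
}

// parseTranscript decodes newline-delimited JSON, skipping blank lines.
func parseTranscript(data string) ([]transcriptLine, error) {
	var lines []transcriptLine
	sc := bufio.NewScanner(strings.NewReader(data))
	// Transcript lines can be long, so raise the scanner's line limit.
	sc.Buffer(make([]byte, 0, 64*1024), 10*1024*1024)
	for sc.Scan() {
		raw := strings.TrimSpace(sc.Text())
		if raw == "" {
			continue
		}
		var l transcriptLine
		if err := json.Unmarshal([]byte(raw), &l); err != nil {
			return nil, fmt.Errorf("parse transcript line %d: %w", len(lines)+1, err)
		}
		lines = append(lines, l)
	}
	if err := sc.Err(); err != nil {
		return nil, fmt.Errorf("scan transcript: %w", err)
	}
	return lines, nil
}

func main() {
	sample := `{"v":1,"agent":"Cursor","cli_version":"dev","type":"user","content":[{"text":"how do I quit this cli"}]}
{"v":1,"agent":"Cursor","cli_version":"dev","type":"assistant","content":[{"text":"You can quit by pressing **Ctrl+C** or typing **exit**.","type":"text"}]}`
	lines, err := parseTranscript(sample)
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Printf("%s (%s): %d content block(s)\n", l.Type, l.Agent, len(l.Content))
	}
}
```

Keeping `input` as `json.RawMessage` avoids guessing at the shape of tool arguments, which differs per tool.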

Note

Medium Risk
Introduces new migration logic that writes to v2 checkpoint refs and performs git tree/commit surgery, which could affect checkpoint data integrity if bugs exist; command is hidden and guarded by an explicit flag, limiting user impact.

Overview
Adds a hidden `entire migrate --checkpoints v2` command to bulk-migrate committed checkpoints from v1 storage into the v2 refs.

Migration iterates v1 checkpoints, writes each session into v2 (optionally generating `transcript.jsonl` via compaction), and is idempotent by skipping already-migrated checkpoints while backfilling missing compact transcripts when possible. For task checkpoints, it also copies task metadata trees into v2 `/full/current` via subtree updates and commits.
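The skip-or-backfill behaviour described here can be modelled roughly as follows. This is a minimal in-memory sketch, not the code in `migrate.go`: `v2Entry`, `result`, `migrate`, and the `canCompact` callback are all hypothetical stand-ins for the real stores and compaction step.

```go
package main

import "fmt"

// v2Entry is a hypothetical stand-in for what v2 holds for one checkpoint.
type v2Entry struct {
	hasFull    bool // full transcript present in v2
	hasCompact bool // compact transcript.jsonl present in v2
}

// result counts the outcomes of one migration run.
type result struct{ migrated, skipped, backfilled int }

// migrate walks the v1 checkpoint IDs and brings v2 up to date.
// canCompact reports whether a compact transcript can be generated
// for the given checkpoint (assumed to fail for some agents).
func migrate(v1 []string, v2 map[string]*v2Entry, canCompact func(id string) bool) result {
	var r result
	for _, id := range v1 {
		e, ok := v2[id]
		switch {
		case !ok:
			// Not in v2 yet: write it, with a compact transcript if possible.
			v2[id] = &v2Entry{hasFull: true, hasCompact: canCompact(id)}
			r.migrated++
		case !e.hasCompact && canCompact(id):
			// Already in v2 but missing the compact transcript: backfill it.
			e.hasCompact = true
			r.backfilled++
		default:
			// Already in v2 and nothing more can be added.
			r.skipped++
		}
	}
	return r
}

func main() {
	v1 := []string{"528c39637ed4", "699504f045e2", "a0f8612fcb64"}
	v2 := map[string]*v2Entry{"a0f8612fcb64": {hasFull: true}}
	canCompact := func(id string) bool { return id != "699504f045e2" }

	fmt.Printf("first run:  %+v\n", migrate(v1, v2, canCompact))
	fmt.Printf("second run: %+v\n", migrate(v1, v2, canCompact))
}
```

Running the loop twice over the same input changes nothing on the second pass, which is the idempotency property the rerun log in the description demonstrates.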

Separately standardizes prompt serialization by introducing `PromptSeparator`, `JoinPrompts`, and `SplitPromptContent`, switching existing v1/v2 checkpoint writers to use the shared join helper and adding focused tests for prompt round-tripping and migration behavior.
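A shared join/split pair of this kind might look like the sketch below. The separator value and the lower-case helper names are assumptions made for illustration; the actual `PromptSeparator` value and helper signatures are not shown in this conversation.

```go
package main

import (
	"fmt"
	"strings"
)

// promptSeparator is an ASSUMED value for illustration only.
const promptSeparator = "\n\n---\n\n"

// joinPrompts serializes prompts into a single string.
func joinPrompts(prompts []string) string {
	return strings.Join(prompts, promptSeparator)
}

// splitPromptContent reverses joinPrompts. Empty content yields no prompts
// rather than a single empty prompt.
func splitPromptContent(content string) []string {
	if content == "" {
		return nil
	}
	return strings.Split(content, promptSeparator)
}

func main() {
	prompts := []string{"create a markdown file", "how do I quit this cli"}
	joined := joinPrompts(prompts)
	fmt.Printf("%q\n", splitPromptContent(joined))
}
```

With a single shared separator, a round trip is lossless only as long as no prompt contains the separator itself, which is one reason to keep writer and reader on the same helper.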

Reviewed by Cursor Bugbot for commit d7e367f.

peyton-alt and others added 7 commits March 30, 2026 18:45
Pre-session dirty files (CLI config files from `entire enable`, leftover
changes from previous sessions) were incorrectly counted as human
contributions, deflating agent percentage.

Root cause: PA1 (first prompt attribution) captures worktree state at
session start. This data was used to correct agent line counts (correct)
but also added to human contributions (wrong).

Fix:
- Split prompt attributions into baseline (PA1) and session (PA2+)
- PA1 data still subtracted from agent work (correct agent calc)
- PA1 contributions excluded from relevantAccumulatedUser
- PA1 removals excluded from totalUserRemoved
- Include PendingPromptAttribution during condensation for agents
  that skip SaveStep (e.g., Codex mid-turn commits)
- Add .entire/ filter to attribution calc (matches existing PA filter)
- Fix wrapcheck lint errors in updateCombinedAttributionForCheckpoint

Verified end-to-end: 100% agent with config files committed alongside.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b0cb4216f6bc
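
The baseline/session split in the commit message above can be illustrated with a simplified model. The `promptAttribution` type, its fields, and the rule that PA1 is identified by prompt number 1 are assumptions for this sketch, not the CLI's real data structures.

```go
package main

import "fmt"

// promptAttribution is a hypothetical, simplified record of the human edits
// observed in the worktree at the start of one prompt.
type promptAttribution struct {
	Prompt      int // 1 for PA1, 2 for PA2, ...
	UserAdded   int
	UserRemoved int
}

// splitBaseline separates PA1 (pre-session worktree state) from PA2+.
func splitBaseline(pas []promptAttribution) (baseline, session []promptAttribution) {
	for _, pa := range pas {
		if pa.Prompt == 1 {
			baseline = append(baseline, pa)
		} else {
			session = append(session, pa)
		}
	}
	return baseline, session
}

// humanTotals sums human contributions from session attributions only,
// so pre-session dirty files never count as human work.
func humanTotals(session []promptAttribution) (added, removed int) {
	for _, pa := range session {
		added += pa.UserAdded
		removed += pa.UserRemoved
	}
	return added, removed
}

func main() {
	pas := []promptAttribution{
		{Prompt: 1, UserAdded: 12, UserRemoved: 3}, // leftover changes before the session
		{Prompt: 2, UserAdded: 4, UserRemoved: 1},
		{Prompt: 3, UserAdded: 2, UserRemoved: 0},
	}
	baseline, session := splitBaseline(pas)
	added, removed := humanTotals(session)
	fmt.Printf("baseline entries=%d human added=%d removed=%d\n", len(baseline), added, removed)
}
```

The baseline entries are kept rather than discarded because, per the commit message, they are still needed to correct the agent's line counts.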
…ibution

Checkpoint package changes required by the attribution baseline fix:
- PromptAttributionsJSON field on WriteCommittedOptions and CommittedMetadata
- UpdateCheckpointSummary method on GitStore for multi-session aggregation
- CombinedAttribution field on CheckpointSummary
- Preserve existing CombinedAttribution during summary rewrites

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: b8963737336c
…arentCommitHash

Fixes all 4 issues from Copilot and Cursor Bugbot review:

1. Precompute parentCommitHash on postCommitActionHandler struct
   using ParentHashes[0] (avoids extra object read, no silent error)
2. Remove duplicated 6-line parentCommitHash computation from
   HandleCondense and HandleCondenseIfFilesTouched
3. Thread parentTree through condenseOpts/attributionOpts and use it
   for non-agent file line counting — ensures diffLines uses parent→HEAD
   (consistent with parentCommitHash file scoping) instead of
   sessionBase→HEAD which over-counted intermediate commit changes
4. Add ParentTreeForNonAgentLines test proving the fix (TDD verified:
   HumanAdded=8 without fix → HumanAdded=3 with fix)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 12f5c4373467
Three fixes for multi-session attribution:

1. Cross-session file exclusion: Thread allAgentFiles (union of all
   sessions' FilesTouched) through the attribution pipeline. Files
   created by other agent sessions are no longer counted as human work.

2. Exclude .entire/ from commit session fallback: When the commit
   session has no FilesTouched and falls back to all committed files,
   filter out .entire/ metadata created by `entire enable`.

3. PA1 baseline uses base tree for new sessions: New sessions
   (StepCount == 0) always diff against the base commit tree, not
   the shared shadow branch which may contain other sessions' state.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 209a37190167
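
Fixes 1 and 2 above amount to a set union followed by a filter. The following is a rough sketch with hypothetical helper names (`unionFilesTouched`, `humanCandidateFiles`), operating on plain path slices rather than the real session state.

```go
package main

import (
	"fmt"
	"strings"
)

// unionFilesTouched builds the set of files touched by any agent session.
func unionFilesTouched(sessions [][]string) map[string]struct{} {
	all := make(map[string]struct{})
	for _, files := range sessions {
		for _, f := range files {
			all[f] = struct{}{}
		}
	}
	return all
}

// humanCandidateFiles returns committed files that may contain human work:
// files no agent session touched, excluding .entire/ metadata.
func humanCandidateFiles(committed []string, allAgentFiles map[string]struct{}) []string {
	var out []string
	for _, f := range committed {
		if strings.HasPrefix(f, ".entire/") {
			continue // CLI metadata, never human work
		}
		if _, agent := allAgentFiles[f]; agent {
			continue // created or edited by some agent session
		}
		out = append(out, f)
	}
	return out
}

func main() {
	sessions := [][]string{{"a.go"}, {"b.go"}}
	committed := []string{"a.go", "b.go", "README.md", ".entire/settings.json"}
	fmt.Println(humanCandidateFiles(committed, unionFilesTouched(sessions)))
}
```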
Copilot AI review requested due to automatic review settings April 4, 2026 00:12
Copilot AI left a comment

Pull request overview

Adds an initial `entire migrate` CLI command intended to migrate v1 checkpoints to the v2 checkpoint ref/layout for testing and rollout prep.

Changes:

  • Registers a new migrate subcommand on the root CLI.
  • Introduces v1→v2 checkpoint migration logic, including transcript compaction and attempted task-metadata tree copying.
  • Adds unit tests covering basic migration flows and idempotency.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| cmd/entire/cli/root.go | Registers the new migrate command in the root CLI. |
| cmd/entire/cli/migrate.go | Implements v1→v2 migration logic, transcript compaction, and task metadata tree splicing. |
| cmd/entire/cli/migrate_test.go | Adds tests for migration behavior (basic/idempotent/multi-session/flag validation). |

Soph and others added 3 commits April 5, 2026 17:11
Entire-Checkpoint: 3790cba265e6
Entire-Checkpoint: c9595c52ab4a
Base automatically changed from feat/checkpoints-v2-push-logic to main April 6, 2026 20:57
computermode and others added 9 commits April 6, 2026 15:04
Entire-Checkpoint: 9f07aeebbf93
Entire-Checkpoint: f1c37c8efc47
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tering

- Test AllAgentFiles cross-session exclusion in CalculateAttributionWithAccumulated
- Test committedFilesExcludingMetadata filters .entire/ paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The combined_attribution field now diffs parent→HEAD once and classifies
files as agent vs human based on the union of sessions with real
checkpoints (SaveStep ran). Filters .entire/ and .claude/ config paths.

Also adds ReadSessionMetadata for lightweight per-session metadata reads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
gtrrz-victor and others added 5 commits April 7, 2026 16:40
…mmit-inflation

Fix attribution inflation from intermediate commits
don't show multiple spaces for codex single line start message rendering
Entire-Checkpoint: 36db97269a69
Entire-Checkpoint: 93066e1dac3c
Entire-Checkpoint: 4fdb72622b7f
@computermode

bugbot run

@computermode computermode changed the title from "WIP: Checkpoints V2: add migration option" to "Checkpoints V2: add migration option" on Apr 7, 2026

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Root-level tasks overwritten by per-session tasks splice
    • copyTaskMetadataToV2 now merges root-level and latest-session task trees before splicing so both task sets are preserved instead of one overwriting the other.
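
The merge described in this fix can be pictured with plain maps standing in for git trees. This is a sketch under a stated assumption: that a "keep existing" merge preserves the base tree's entry when both sides define the same name. The `mergeKeepExisting` helper and the hash strings are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// mergeKeepExisting merges incoming entries (name -> object hash) into base.
// ASSUMPTION: on a name collision the base entry wins.
// Neither input map is modified.
func mergeKeepExisting(base, incoming map[string]string) map[string]string {
	merged := make(map[string]string, len(base)+len(incoming))
	for name, hash := range base {
		merged[name] = hash
	}
	for name, hash := range incoming {
		if _, exists := merged[name]; !exists {
			merged[name] = hash
		}
	}
	return merged
}

func main() {
	rootTasks := map[string]string{"toolu_01ROOT": "hash-root"}
	sessionTasks := map[string]string{"toolu_01SESSION": "hash-session"}

	merged := mergeKeepExisting(rootTasks, sessionTasks)
	names := make([]string, 0, len(merged))
	for name := range merged {
		names = append(names, name)
	}
	sort.Strings(names)
	fmt.Println(names)
}
```

Splicing the merged result once, instead of splicing root and session trees one after the other, is what keeps the second write from replacing the first.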

Or push these changes by commenting:

@cursor push 033e8c1ad0
Preview (033e8c1ad0)
diff --git a/cmd/entire/cli/migrate.go b/cmd/entire/cli/migrate.go
--- a/cmd/entire/cli/migrate.go
+++ b/cmd/entire/cli/migrate.go
@@ -359,14 +359,18 @@
 		return err
 	}
 
+	latestSessionIdx := -1
+	if len(summary.Sessions) > 0 {
+		latestSessionIdx = len(summary.Sessions) - 1
+	}
+
 	// Legacy v1 layout stores task metadata at checkpoint root: <cp>/tasks/<tool-use-id>/...
-	// Prefer attaching this tree to the latest session in v2.
-	if rootTasksTree, rootTasksErr := v1Tree.Tree("tasks"); rootTasksErr == nil {
-		if len(summary.Sessions) > 0 {
-			latestSessionIdx := len(summary.Sessions) - 1
-			if spliceErr := spliceTasksTreeToV2(repo, v2Store, cpID, latestSessionIdx, rootTasksTree.Hash); spliceErr != nil {
-				return fmt.Errorf("latest session task tree splice failed: %w", spliceErr)
-			}
+	// Attach this to the latest session in v2, and merge with that session's own tasks if present.
+	var rootTasksTree *object.Tree
+	rootTasksSpliced := false
+	if latestSessionIdx >= 0 {
+		if tasksTree, rootTasksErr := v1Tree.Tree("tasks"); rootTasksErr == nil {
+			rootTasksTree = tasksTree
 		}
 	}
 
@@ -382,11 +386,33 @@
 			continue // No tasks directory in this session
 		}
 
-		if spliceErr := spliceTasksTreeToV2(repo, v2Store, cpID, sessionIdx, tasksTree.Hash); spliceErr != nil {
+		tasksTreeHash := tasksTree.Hash
+		if rootTasksTree != nil && sessionIdx == latestSessionIdx {
+			mergedTasksTreeHash, mergeErr := checkpoint.UpdateSubtree(
+				repo,
+				rootTasksTree.Hash,
+				nil,
+				tasksTree.Entries,
+				checkpoint.UpdateSubtreeOptions{MergeMode: checkpoint.MergeKeepExisting},
+			)
+			if mergeErr != nil {
+				return fmt.Errorf("failed to merge root and session task trees for session %d: %w", sessionIdx, mergeErr)
+			}
+			tasksTreeHash = mergedTasksTreeHash
+			rootTasksSpliced = true
+		}
+
+		if spliceErr := spliceTasksTreeToV2(repo, v2Store, cpID, sessionIdx, tasksTreeHash); spliceErr != nil {
 			return fmt.Errorf("session %d task tree splice failed: %w", sessionIdx, spliceErr)
 		}
 	}
 
+	if rootTasksTree != nil && !rootTasksSpliced {
+		if spliceErr := spliceTasksTreeToV2(repo, v2Store, cpID, latestSessionIdx, rootTasksTree.Hash); spliceErr != nil {
+			return fmt.Errorf("latest session task tree splice failed: %w", spliceErr)
+		}
+	}
+
 	return nil
 }
 

diff --git a/cmd/entire/cli/migrate_test.go b/cmd/entire/cli/migrate_test.go
--- a/cmd/entire/cli/migrate_test.go
+++ b/cmd/entire/cli/migrate_test.go
@@ -228,6 +228,52 @@
 	require.NoError(t, taskFileErr, "expected migrated task checkpoint metadata in /full/current")
 }
 
+func TestMigrateCheckpointsV2_TaskMetadataMergesRootAndSessionTasks(t *testing.T) {
+	t.Parallel()
+	repo := initMigrateTestRepo(t)
+	v1Store, v2Store := newMigrateStores(repo)
+
+	cpID := id.MustCheckpointID("c1d2e3f4a5b6")
+
+	metadataDir := t.TempDir()
+	sessionTaskFile := filepath.Join(metadataDir, "tasks", "toolu_01SESSION", "checkpoint.json")
+	require.NoError(t, os.MkdirAll(filepath.Dir(sessionTaskFile), 0o755))
+	require.NoError(t, os.WriteFile(sessionTaskFile, []byte(`{"source":"session"}`), 0o644))
+
+	// Write one v1 task checkpoint that has both:
+	// 1) root-level task metadata (legacy layout, from IsTask/ToolUseID)
+	// 2) session-level task metadata (from MetadataDir copy into session subtree)
+	err := v1Store.WriteCommitted(context.Background(), checkpoint.WriteCommittedOptions{
+		CheckpointID: cpID,
+		SessionID:    "session-task-merge-001",
+		Strategy:     "manual-commit",
+		Transcript:   []byte("{\"type\":\"assistant\",\"message\":\"task merge\"}\n"),
+		Prompts:      []string{"task merge prompt"},
+		IsTask:       true,
+		ToolUseID:    "toolu_01ROOT",
+		MetadataDir:  metadataDir,
+		AuthorName:   "Test",
+		AuthorEmail:  "test@test.com",
+	})
+	require.NoError(t, err)
+
+	var stdout bytes.Buffer
+	result, migrateErr := migrateCheckpointsV2(context.Background(), repo, v1Store, v2Store, &stdout)
+	require.NoError(t, migrateErr)
+	assert.Equal(t, 1, result.migrated)
+
+	_, rootTreeHash, refErr := v2Store.GetRefState(plumbing.ReferenceName(paths.V2FullCurrentRefName))
+	require.NoError(t, refErr)
+	rootTree, treeErr := repo.TreeObject(rootTreeHash)
+	require.NoError(t, treeErr)
+
+	// Both root-level and per-session tasks must exist after migration.
+	_, rootTaskErr := rootTree.File(cpID.Path() + "/0/tasks/toolu_01ROOT/checkpoint.json")
+	require.NoError(t, rootTaskErr, "expected root-level task metadata in /full/current")
+	_, sessionTaskErr := rootTree.File(cpID.Path() + "/0/tasks/toolu_01SESSION/checkpoint.json")
+	require.NoError(t, sessionTaskErr, "expected session-level task metadata in /full/current")
+}
+
 func TestMigrateCheckpointsV2_AllSkippedOnRerun(t *testing.T) {
 	t.Parallel()
 	repo := initMigrateTestRepo(t)

@computermode computermode marked this pull request as ready for review April 7, 2026 20:55
@computermode computermode requested a review from a team as a code owner April 7, 2026 20:55
Entire-Checkpoint: 730e93f6b572
@computermode computermode enabled auto-merge April 8, 2026 18:56
@computermode computermode merged commit 8ebb929 into main Apr 8, 2026
3 checks passed
@computermode computermode deleted the add-migrate-v2-command branch April 8, 2026 19:03

6 participants